Reinforcement Learning with Policy Constraints

Authors

Sebastian Thrun

Jamieson E. Schulte

Abstract

This paper addresses the problem of knowledge transfer in lifelong reinforcement learning. It proposes an algorithm that learns policy constraints, i.e., rules that characterize action selection across entire families of reinforcement learning tasks. Once learned, policy constraints are used to bias learning in future, similar reinforcement learning tasks. The appropriateness of the algorithm is demonstrated in two domains: a grid world domain and a (more challenging) light control problem for commercial office space.

Submitted to ICML-98.
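The abstract does not say how policy constraints are represented or learned. The sketch below is purely illustrative and is not the authors' algorithm. It makes three assumptions:

- A "policy constraint" is a per-state set of allowed actions.
- The constraint is "learned" as the union of actions that were optimal in previously solved grid-world tasks, which are solved here by value iteration.
- The constraint biases a new task by restricting ε-greedy Q-learning to the allowed actions.

All names (`learn_constraint`, `constrained_q_learning`, the 5×5 grid, the corner source goals, the step reward of -1) are hypothetical choices made for this example.

```python
import random

# Four grid moves as (row delta, column delta).
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action, size):
    """Deterministic move on a size x size grid; bumping into a wall leaves the agent in place."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    return (r, c) if 0 <= r < size and 0 <= c < size else state


def optimal_actions(goal, size, gamma=0.95, sweeps=200):
    """Value iteration for one task (reward -1 per step, goal absorbing).
    Returns the set of optimal actions for every non-goal state."""
    states = [(r, c) for r in range(size) for c in range(size)]
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        V = {s: 0.0 if s == goal else max(-1 + gamma * V[step(s, a, size)] for a in ACTIONS)
             for s in states}
    best = {}
    for s in states:
        if s == goal:
            continue
        q = {a: -1 + gamma * V[step(s, a, size)] for a in ACTIONS}
        top = max(q.values())
        best[s] = {a for a, v in q.items() if v >= top - 1e-9}
    return best


def learn_constraint(source_goals, size):
    """Hypothetical constraint learner: an action is allowed in a state if it was
    optimal there in at least one previously solved source task."""
    allowed = {}
    for goal in source_goals:
        for s, acts in optimal_actions(goal, size).items():
            allowed.setdefault(s, set()).update(acts)
    return allowed


def constrained_q_learning(goal, size, allowed, episodes=300, alpha=0.5,
                           gamma=0.95, epsilon=0.1, seed=0):
    """Tabular epsilon-greedy Q-learning on a new task; both exploration and the
    bootstrap max consider only the actions the constraint allows."""
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        s = (rng.randrange(size), rng.randrange(size))  # random start state
        for _ in range(100):  # cap on episode length
            if s == goal:
                break
            acts = sorted(allowed.get(s, ACTIONS))  # fall back to all actions if unconstrained
            if rng.random() < epsilon:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda b: Q.get((s, b), 0.0))
            s2 = step(s, a, size)
            future = 0.0 if s2 == goal else max(
                Q.get((s2, b), 0.0) for b in allowed.get(s2, ACTIONS))
            old = Q.get((s, a), 0.0)
            Q[(s, a)] = old + alpha * (-1 + gamma * future - old)
            s = s2
    return Q


SIZE = 5
corners = [(0, 0), (0, SIZE - 1), (SIZE - 1, 0), (SIZE - 1, SIZE - 1)]
allowed = learn_constraint(corners, SIZE)  # knowledge extracted from four source tasks
print(sorted(allowed[(0, 0)]))             # ['down', 'right']: moves into walls are ruled out
Q = constrained_q_learning(goal=(2, 2), size=SIZE, allowed=allowed)  # a new, similar task
```

In this toy setting the learned constraint only removes moves into walls. Shortest paths to the new goal (the grid centre) therefore remain available, while exploration skips actions that never helped in any source task.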

Similar resources

Nonconvex Policy Search Using Variational Inequalities

Policy search is a class of reinforcement learning algorithms for finding optimal policies in control problems with limited feedback. These methods have been shown to be successful in high-dimensional problems such as robotics control. Though successful, current methods can lead to unsafe policy parameters that potentially could damage hardware units. Motivated by such constraints, we propose p...

Reinforcement Learning in Robotics: Applications and Real-World Challenges

In robotics, the ultimate goal of reinforcement learning is to endow robots with the ability to learn, improve, adapt and reproduce tasks with dynamically changing constraints based on exploration and autonomous learning. We give a summary of the state-of-the-art of reinforcement learning in the context of robotics, in terms of both algorithms and policy representations. Numerous challenges fac...

RTP-Q: A Reinforcement Learning System with Time Constraints Exploration Planning for Accelerating the Learning Rate

Reinforcement learning is an efficient method for solving Markov Decision Processes, in which an agent improves its performance using scalar reward values, giving it a strong capability for reactive and adaptive behavior. Q-learning is a representative reinforcement learning method that is guaranteed to obtain an optimal policy but needs numerous trials to achieve it. k-Certainty Exploration Learning Sy...

Call Admission Control in Wireless DS-CDMA Systems Using Reinforcement Learning

THAI) Telecommunication Engineering program, academic year 2549 (Thai Buddhist Era, i.e., 2006 CE). PITIPONG CHANLOHA: CALL ADMISSION CONTROL IN WIRELESS DS-CDMA SYSTEMS USING REINFORCEMENT LEARNING. THESIS ADVISOR: ASST. PROF. WIPAWEE HATTAGAM, Ph.D. 95 PP. ABSTRACT (ENGLISH) DIRECT-SEQUENTIAL CODE DIVISION MULTIPLE ACCESS (DS-CDMA)/ CALL ADMISSION CONTROL/ REINFORCEMENT LEARNING/ ACTOR-CRITIC REINFO...

Constrained Policy Optimization

For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function. For example, systems that physically interact with or around humans should satisfy safety constraints. Recent advances in policy search algorithms (Mnih et al., 2016; Schulman et al., 2015; Lillicrap et al...

Publication date: 2007
